Zehui Chen

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Feb 02, 2026
Via arXiv

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Jan 29, 2026
Via arXiv

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

Jan 08, 2026
Via arXiv

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Sep 10, 2025
Via arXiv

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

May 28, 2025
Via arXiv

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Apr 10, 2025
Via arXiv

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Feb 25, 2025
Via arXiv

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Jan 20, 2025
Via arXiv

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use

Jan 07, 2025
Via arXiv

LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation

Nov 19, 2024
Via arXiv